Main Theme: AI and Society
Hands-On Introduction to Recommender Systems with R
Audience: High school students new to R
Duration: 2 hours
Environment: R and RStudio
KT Wong
kwanto@hku.hk
Faculty of Social Sciences, HKU
2025-07-31
This workshop introduces recommender systems using R in RStudio.
Task: Analyze user ratings for AI-related products (e.g. apps, tools)
Roadmap
Learning Goals:
What are Recommender Systems?
How They Work:
e.g. if you buy a book, you get suggestions for similar books
Focus on how collaborative filtering works; introduce content-based filtering briefly if time allows
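To make the idea concrete, here is a minimal base-R sketch of user-based collaborative filtering. The four users and their ratings are made up for illustration (they are not workshop data), and `cosine_sim` is a hypothetical helper written for this example. The sketch measures how similar each user is to User1, then guesses User1's missing ArtGen rating as a similarity-weighted average of the other users' ratings.

```r
# Toy ratings (made-up values): rows = users, columns = products, NA = not rated
toy <- matrix(c(5, 4, NA, 1,
                4, 5, 5,  1,
                1, 2, 1,  5,
                5, NA, 4, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("User", 1:4),
                              c("StudyApp", "ChatBot", "ArtGen", "CodeTool")))

# Cosine similarity between two users, using only products both have rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# How similar is every other user to User1?
others <- setdiff(rownames(toy), "User1")
sims <- sapply(others, function(u) cosine_sim(toy["User1", ], toy[u, ]))

# Predict User1's missing ArtGen rating as a similarity-weighted average
# of the ratings from users who did rate ArtGen
rated <- others[!is.na(toy[others, "ArtGen"])]
pred_artgen <- sum(sims[rated] * toy[rated, "ArtGen"]) / sum(sims[rated])

round(sims, 2)         # similarity of each user to User1
round(pred_artgen, 2)  # predicted rating, on the 1 to 5 scale
```

Users whose tastes match User1 more closely get a larger say in the prediction, which is the core idea behind the collaborative filtering model used later.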
Objective: Set up RStudio
File > New File > R Script
Objective: Load packages and dataset
library(tidyverse)
library(recommenderlab)
library(proxy)
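set.seed(123)  # fix the random seed so everyone gets the same simulated ratings
# Roughly 64% of cells are NA (not rated); sample() rescales prob so it sums to 1
ratings <- matrix(sample(c(NA, 1:5), 300, replace = TRUE, prob = c(0.7, 0.1, 0.1, 0.1, 0.05, 0.05)),
                  nrow = 30, ncol = 10)
colnames(ratings) <- c("StudyApp", "ChatBot", "ArtGen", "CodeTool", "VoiceAI",
                       "HealthAI", "GameAI", "MusicAI", "CarAI", "TutorAI")
rownames(ratings) <- paste0("User", 1:30)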
ratings <- as(ratings, "realRatingMatrix")
Task 2.1: Run code (Ctrl+Enter)
Task 2.2: Take a look at the ratings matrix
View(as(ratings, "matrix"))
Objective: Understand dataset structure
The dataset is a matrix of user ratings for AI products
Task 2.3: Check dimensions and product names
dim(ratings)
colnames(ratings)
Objective: Create user-based collaborative filtering model
Objective: Predict recommendations for users
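The code for these two objectives is not included in this handout. What follows is a minimal sketch, assuming recommenderlab's `Recommender()` with `method = "UBCF"` and its `predict()` method. The name `ubcf_model`, the seed, and the parameter values (cosine similarity, 5 nearest neighbours) are assumptions made for illustration. `predictions` and `pred_list` are named to match the objects used in later steps.

```r
library(recommenderlab)

# Recreate the simulated ratings so this block runs on its own.
# The seed is an assumption; any value works.
set.seed(123)
products <- c("StudyApp", "ChatBot", "ArtGen", "CodeTool", "VoiceAI",
              "HealthAI", "GameAI", "MusicAI", "CarAI", "TutorAI")
ratings <- matrix(sample(c(NA, 1:5), 300, replace = TRUE,
                         prob = c(0.7, 0.1, 0.1, 0.1, 0.05, 0.05)),
                  nrow = 30, ncol = 10,
                  dimnames = list(paste0("User", 1:30), products))
ratings <- as(ratings, "realRatingMatrix")

# Build the model: for each user, look at the 5 most similar users
# (cosine similarity). Both parameter values are illustrative choices.
ubcf_model <- Recommender(ratings, method = "UBCF",
                          parameter = list(method = "cosine", nn = 5))

# Ask for the top 2 not-yet-rated products for the first three users
predictions <- predict(ubcf_model, ratings[1:3, ], n = 2)
pred_list <- as(predictions, "list")
pred_list[[1]]  # suggestions for User1 (can be fewer than 2 on sparse data)
```

Because the ratings are random and sparse, some users may receive fewer than two suggestions; with real data, similar users tend to share more rated products.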
Objective: Compare with content-based filtering
# Product features (simplified: 1 = has feature, 0 = doesn’t)
features <- matrix(c(
  1, 0, 0, 1, 0, 0, 0, 0, 0, 1, # StudyApp: Education, Coding, Tutoring
  0, 1, 0, 0, 1, 0, 0, 0, 0, 0, # ChatBot: Interaction, Voice
  0, 0, 1, 0, 0, 0, 1, 1, 0, 0, # ArtGen: Creativity, Gaming, Music
  1, 0, 0, 1, 0, 0, 0, 0, 0, 0, # CodeTool: Education, Coding
  0, 1, 0, 0, 1, 0, 0, 0, 0, 0, # VoiceAI: Interaction, Voice
  0, 0, 0, 0, 0, 1, 0, 0, 0, 0, # HealthAI: Health
  0, 0, 1, 0, 0, 0, 1, 0, 0, 0, # GameAI: Creativity, Gaming
  0, 0, 1, 0, 0, 0, 0, 1, 0, 0, # MusicAI: Creativity, Music
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0, # CarAI: Driving
  1, 0, 0, 0, 0, 0, 0, 0, 0, 1  # TutorAI: Education, Tutoring
),
nrow = 10, byrow = TRUE)
colnames(features) <- c("Education", "Interaction", "Creativity", "Coding",
                        "Voice", "Health", "Gaming", "Music", "Driving", "Tutoring")
rownames(features) <- colnames(ratings)
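# Cosine similarity for content-based recommendations
sim_matrix <- as.matrix(simil(features, method = "cosine"))
diag(sim_matrix) <- NA  # ignore each product's similarity to itself
# For each product, pick the most similar other product (sort() drops the NA)
content_recs <- apply(sim_matrix, 1, function(x) names(sort(x, decreasing = TRUE))[1])
content_recs["StudyApp"]
content_recs["ChatBot"]
Objective: Assess recommendation quality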
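The code that creates `test` and `pred_test` is not included in this handout. One possible setup, sketched here under stated assumptions, is a simple hold-out split by user: train on the first 24 users and generate recommendations for the remaining 6. The split sizes, the seed, the name `model_train`, and the model parameters are all assumptions made for illustration.

```r
library(recommenderlab)

# Recreate the simulated ratings so this block runs on its own.
# The seed is an assumption; any value works.
set.seed(123)
products <- c("StudyApp", "ChatBot", "ArtGen", "CodeTool", "VoiceAI",
              "HealthAI", "GameAI", "MusicAI", "CarAI", "TutorAI")
ratings <- matrix(sample(c(NA, 1:5), 300, replace = TRUE,
                         prob = c(0.7, 0.1, 0.1, 0.1, 0.05, 0.05)),
                  nrow = 30, ncol = 10,
                  dimnames = list(paste0("User", 1:30), products))
ratings <- as(ratings, "realRatingMatrix")

# Hold-out split by user: 24 users train the model, 6 are kept aside
train <- ratings[1:24, ]
test  <- ratings[25:30, ]

# Train on the training users only, then recommend for the held-out users
model_train <- Recommender(train, method = "UBCF",
                           parameter = list(method = "cosine", nn = 5))
pred_test <- predict(model_train, test, n = 2)
```

Note that `predict()` only suggests products a user has not rated yet, so the comparison with the user's actual ratings is a sanity check rather than an accuracy score. A fuller evaluation would hide some known ratings first, which is what recommenderlab's `evaluationScheme()` is designed for.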
as(pred_test, "list")[[1]]
as(test, "matrix")[1, ]
Objective: Visualize recommendations, compare methods, discuss societal impact
Plot top recommendations; compare collaborative vs. content-based
Collaborative Filtering Plot:
top_recs <- as(predictions, "list")[[1]]
# seq_along() is safe even if there are no recommendations (1:length() is not)
rec_data <- tibble(Product = top_recs, Score = seq_along(top_recs))
ggplot(rec_data, aes(x = reorder(Product, Score), y = Score)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Top Recommendations for User1 (Collaborative)", x = "Product", y = "Rank") +
  theme_minimal()
comparison <- tibble(
  Product = c(colnames(ratings)[1:2], content_recs[1:2]),
  Method = rep(c("Collaborative", "Content-Based"), each = 2),
  Rank = rep(1:2, 2)
)
ggplot(comparison,
       aes(x = Product, y = Rank, fill = Method)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Recommendation Comparison",
       x = "Product",
       y = "Rank") +
  scale_fill_manual(values = c("#1f77b4", "#ff7f0e")) +
  theme_minimal()
Task 8.1: Run collaborative plot
Task 8.2: Run comparison plot
Compare methods for StudyApp
Task 8.3: Check differences
tibble(Collaborative = pred_list[[1]], ContentBased = content_recs[pred_list[[1]]])
Objective: Use matrix factorization for recommendations
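The model code for this step is not included in this handout. A minimal sketch, assuming recommenderlab's built-in `"SVD"` method: matrix factorization compresses the ratings matrix into a small number of hidden factors and uses them to estimate the missing ratings. The value `k = 3` (number of hidden factors) is an assumption, kept small because there are only 10 products; the seed and the UBCF parameters are also illustrative. `pred_list_ubcf` and `pred_list_svd` are named to match the objects used in the following steps.

```r
library(recommenderlab)

# Recreate the simulated ratings so this block runs on its own.
# The seed is an assumption; any value works.
set.seed(123)
products <- c("StudyApp", "ChatBot", "ArtGen", "CodeTool", "VoiceAI",
              "HealthAI", "GameAI", "MusicAI", "CarAI", "TutorAI")
ratings <- matrix(sample(c(NA, 1:5), 300, replace = TRUE,
                         prob = c(0.7, 0.1, 0.1, 0.1, 0.05, 0.05)),
                  nrow = 30, ncol = 10,
                  dimnames = list(paste0("User", 1:30), products))
ratings <- as(ratings, "realRatingMatrix")

# User-based collaborative filtering, for comparison
ubcf_model <- Recommender(ratings, method = "UBCF",
                          parameter = list(method = "cosine", nn = 5))

# Matrix factorization with k hidden factors (k = 3 is an illustrative choice)
svd_model <- Recommender(ratings, method = "SVD", parameter = list(k = 3))

# Top 2 suggestions per method for the first three users
pred_list_ubcf <- as(predict(ubcf_model, ratings[1:3, ], n = 2), "list")
pred_list_svd  <- as(predict(svd_model,  ratings[1:3, ], n = 2), "list")
```

The two methods can suggest different products for the same user: UBCF relies on a handful of similar users, while SVD uses patterns across the whole matrix.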
pred_list_svd[[1]]
c(UBCF = pred_list_ubcf[[1]], SVD = pred_list_svd[[1]])
Objective: Use Latent Dirichlet Allocation for recommendations
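library(topicmodels)  # provides LDA() and posterior()
# Convert ratings to binary (rated = 1, not rated = 0) for LDA
# Missing ratings only show up as NA after converting to a base matrix
ratings_binary <- (!is.na(as(ratings, "matrix"))) * 1
# LDA needs at least one rated product per user, so drop empty rows
ratings_binary <- ratings_binary[rowSums(ratings_binary) > 0, , drop = FALSE]
# Run LDA with 5 topics
lda_model <- LDA(ratings_binary, k = 5, control = list(seed = 123))
user_topics <- posterior(lda_model)$topics  # users x topics
topic_items <- posterior(lda_model)$terms   # topics x products
# Score every product for the first remaining user (User1, if they rated anything)
user1_scores <- as.vector(user_topics[1, ] %*% topic_items)
names(user1_scores) <- colnames(ratings_binary)
user1_scores[ratings_binary[1, ] == 1] <- -Inf  # skip products already rated
# Recommend the two products that best match this user's topics
lda_recs <- names(sort(user1_scores, decreasing = TRUE))[1:2]
lda_recs
c(UBCF = pred_list_ubcf[[1]], LDA = lda_recs)
Objective: Visualize recommendations, compare methods, discuss societal impact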
Plot top recommendations; compare UBCF, SVD, and LDA
Collaborative Filtering Plot:
top_recs_ubcf <- pred_list_ubcf[[1]]
rec_data <- tibble(Product = top_recs_ubcf, Score = seq_along(top_recs_ubcf))
ggplot(rec_data,
       aes(x = reorder(Product, Score), y = Score)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Top Recommendations for User1 (UBCF)",
       x = "Product", y = "Rank") +
  theme_minimal()
comparison <- tibble(
  Product = c(pred_list_ubcf[[1]], pred_list_svd[[1]], lda_recs),
  Method = rep(c("UBCF", "SVD", "LDA"), each = 2),
  Rank = rep(1:2, 3)
)
ggplot(comparison,
       aes(x = Product, y = Rank, fill = Method)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Recommendation Comparison for User1", x = "Product", y = "Rank") +
  scale_fill_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  theme_minimal()
top_recs_ubcf
comparison
tibble(UBCF = pred_list_ubcf[[1]], SVD = pred_list_svd[[1]], LDA = lda_recs)
Why do methods differ? (UBCF: user similarity, SVD: hidden patterns, LDA: themes)
Societal impacts: Personalization vs. privacy, filter bubbles
How can recommendations improve AI use in society?